BrCNGC gene family in field mustard: genome-wide identification, characterization, comparative synteny, evolution and expression profiling

CNGCs are ligand-gated calcium signaling channels, which participate in important biological processes in eukaryotes. However, the CNGC gene family is not well-investigated in Brassica rapa L. (i.e., field mustard) that is economically important and evolutionary model crop. In this study, we systematically identified 29 member genes in BrCNGC gene family, and studied their physico-chemical properties. The BrCNGC family was classified into four major and two sub phylogenetic groups. These genes were randomly localized on nine chromosomes, and dispersed into three sub-genomes of B. rapa L. Both whole-genome triplication and gene duplication (i.e., segmental/tandem) events participated in the expansion of the BrCNGC family. Using in-silico bioinformatics approaches, we determined the gene structures, conserved motif compositions, protein interaction networks, and revealed that most BrCNGCs can be regulated by phosphorylation and microRNAs of diverse functionality. The differential expression patterns of BrCNGC genes in different plant tissues, and in response to different biotic, abiotic and hormonal stress types, suggest their strong role in plant growth, development and stress tolerance. Notably, BrCNGC-9, 27, 18 and 11 exhibited highest responses in terms of fold-changes against club-root pathogen Plasmodiophora brassicae, Pseudomonas syringae pv. maculicola, methyl-jasmonate, and trace elements. These results provide foundation for the selection of candidate BrCNGC genes for future breeding of field mustard.


Results and discussion
Genome-wide identification of BrCNGC family. The CNGC genes play vital roles in development, ion transport, signaling and stress responses [11][12][13] , and the CNGC gene families have been studied in limited yet important crops [30][31][32][33] . However, the systematic identification and annotation of this family has not been performed in crucifer plants except for Chinese cabbage by our group recently 16 . The genomic sequence of B. rapa L., one of the most significant species of Brassica genus, was released in 2011 24 . Therefore, proper annotation and identification of the CNGC genes in B. rapa L. was performed in this study. All non-redundant putative gene sequences were retrieved from BRAD database, and analysed for the presence of plant CNGC-specific conserved domains and motifs. Consequently, accessions either having truncated sequences or missing CNGC-specific domains were discarded from further analysis. For instance, accession Bra022235, was a short truncated sequence lacking essential plant CNGC-specific domains such as CNBD 33 . Finally, twenty-nine genes with full length amino acid sequences (> 500 aa) were identified and confirmed as members of the BrCNGC family (Table 1). Each protein of the BrCNGC family comprised a fully conserved CNBD and IT domains, with overlapped CaMBD and adjacent IQ domains (Fig. 1a,c). Within the CNBD, the two most conserved regions were identified: a PBC motif, which binds the sugar and phosphate moieties of the cNMP ligand, and a "hinge" region adjacent to the PBC, which is believed to contribute to ligand binding efficacy and selectivity 17 . Moreover, the latest criterion for identification of CNGC genes is the validation of CNGC-specific motif key, which upon failing can mislead both the readers and researchers regarding the plant CNGCs including their classification and overall structure as a family. Using multiple sequence alignment at > 90% conservation, we deduced the BrCNGC-specific consensus motif key [ Phylogenetic analysis and classification of BrCNGCs. It is anticipated that homologs belonging to the similar taxonomic clade probably also resemble in structural, functional and evolutionary properties 34 . Such information can be used in clarifying the role(s) of the newly identified BrCNGCs. The multiple sequence alignment using full length amino acid sequences and conserved domains showed > 90% resemblance of the representative BrCNGCs among themselves, and with their respective orthologs in A. thaliana (i.e., AtCNGCs) and B. oleracea (BoCNGCs) ( Fig. 2; Supplementary Figs. S1-S4 and Tables S1−S2) 16 . Using neighbor-joining method, the BrCNGC gene family was classified into four main clades based on the classification of AtCNGCs, tree topology and bootstrap values ( Fig. 2; Supplementary Fig. S4). The member BrCNGC genes were named based on their positions in phylogenetic tree. Among these, seven BrCNGC genes clustered in clade-I, five in clade-II, and six in clade-III. Clade-IV that was additionally separated into two sub-clade (IV-a and IV-b), contained highest number of BrCNGCs genes (i.e., 11). These findings were in covenant to the previous investigations 1, 15 www.nature.com/scientificreports/ Chromosomal mapping and distribution on three sub-genomes. The 29 BrCNGC genes were unsystematically dispersed across the B. rapa L. genome and localized on nine of ten chromosomes (i.e., A01-07 and A09-A10). The distribution of BrCNGC genes on chromosomes was uneven, for example, chromosome A01 carried six genes, while others had 2-4 genes. None of BrCNGC genes was localized on chromosome A08. Among 29 BrCNGC genes, 15 loci were located on forward strand, while 14 loci were positioned on reverse strand of the chromosomes (Fig. 3). Similar to B. oleracea, the genome of B. rapa L. is currently fractioned into three sub-genomes: i.e., least fractionated (LF), medium fractionated (MF-I) and most fractionated (MF-II) 24 . The LF sub-genome of B. rapa L. contained maximum numbers of BrCNGC genes (i.e., 14 genes), while MF-II carried only 3 BrCNCG genes (Table 2). These findings are agreement to our previous findings of BoCNGC subgenomes 16 .
Evolution of BrCNGC family. Origin and comparative synteny analysis of BrCNGC family genes. B. rapa L. is an ancient polyploid, whose genome has undergone whole genome triplication (WGT) event ~ 13-17 million years ago (MYA), after divergence from A. thaliana, followed by large-scale re-diploidization (chromosomal re-arrangements) 35 Table 2). We found that more than > 80% of BrCNGC genes are located in wellconserved syntenic blocks, with deletion and gain of some genes, which coincides with the previous findings 39 . Compared with the ancestral Brassicaceae blocks (A to X) in A. thaliana, the synteny of 75% of the CNGC gene family was preserved in Brassica species, based on the number of corresponding genes. Ten of the 20 AtCNGC genes were retained as single copy in the equivalent blocks of both Brassica species. Three AtCNGC genes (i.e., AT2G23980, AT2G24610 and AT5G54250), located on I and W syntenic blocks, were preserved as two copies  Bra034281  A04  11980216  11982791  +  cNMP  IT  CaMBD  IQ  G1  12   BrCNGC2  Bra003323  A07  15879616  15883454  -cNMP  IT  -IQ  12   BrCNGC3  Bra004537  A05  687357  690331  -cNMP  IT  -IQ  3   BrCNGC4  Bra031515  A01  16651616  16656087  +  cNMP  IT  -IQ  3   BrCNGC5  Bra000937  A03  14054247  14058116  -cNMP  IT  CaMBD  IQ  13   BrCNGC6  Bra003081  A10  5414086  5416746  -cNMP  IT  CaMBD  IQ  www.nature.com/scientificreports/ in Brassica genomes, which were asymmetrically fractionated into three sub-genomes. Two AtCNGC genes (i.e., AT3G17690 and AT3G17700) in F syntenic block were retained as three copies in each species. Two BrCNGC genes (i.e., BrCNGC1 and BrCNGC24) were respectively located on conserved syntenic block with BoCNGC3 and BoCNGC23, but not with AtCNGC genes. An extra gene copy (Bra022235) was located on potential overlap/ tandemly repeated regions of F block along with gene pair BrCNGC26 and BrCNGC29 (Table 2). Thorough examination revealed that this gene has lost its functional CNBD domain during the course of evolution. These results are agreement to the findings of Duan et al. 40 who reported that functionally redundant gene copies are lost after genome duplication event, while functionally important some gene copies are retained. Together, these finding suggest that WGT, along with segmental duplication played important role in expansion of BrC-NGC gene family overall, while, tandem duplication was identified to play role in expansion of group IV-b only. Moreover, conservation of CNGC genes after substantial genome reshuffling event suggests that these genes are crucial for plant development 41 .
Gene duplication events and expansion of BrCNGC family. Gene family expands through one of three possible mechanisms including tandem and segmental duplication, and/or whole-genome duplication 42 . The examina-   Fig. S6). Utmost of the phase-1 and 2 introns were existing in AtCNGC genes, inferring that intron loss for the duration of evolution caused in a reduction in the number of introns in BoCNGC genes, principally those in clade I-III and IV-a ( Supplementary Fig. S7). Ten conserved motifs were identified in BrCNGCs during motif structure studies Multiple Expectation Maximization for Motif Elicitation suite (MEME) 44 . Rendering to Pfam codes 45 and WebLogos, only six motifs (i.e., 1-4, 7, and 8) comprised domains with known functions (Fig. 4, Supplementary Fig. S8 and Table S3). Motif 1 was the biggest motif accompanying with product of unknown functions. Motifs 2, 3, 7, 8 and 4, which encode a CNBD, an ion transport domain, and IQ domain, correspondingly, were preserved among all BrCNGC family members. Notably, each clade members had similar arrangement of functionally annotated motifs, reveals that the directly associated proteins in each clade showed alike motif arrangements and perhaps functional resemblances too. The functionality of the leftover motifs (1, 5, 6, 9 and 10) wait for additional experimental evidence.
Protein sequence features and physico-chemical properties of BrCNGCs. The biochemical and physiological characteristics of the 29 BrCNGC proteins were identified ( Table 3). The ProtParam tool showed that most of these proteins are localized in plasma-membrane. BrCNGC proteins varied in lengths from 556 to 786 aa with average of 711 aa, molecular weight (64.27-90.34 kDa), and residue weight (112.566-116.172 g/mol depending on the number of atoms present. Approximately, one-thirds of the BrCNGC proteins had low net charge (< 19) and relatively low isoelectric points (pI < 9). Approximately, all BrCNGC were hydrophilic, with BrCNGC21 and BrCNGC23 being somewhat hydrophobic. Based on the aliphatic index, most BrCNGC proteins were thermo-   (Table 3). Additionally, the BrCNGC proteins had more positively charged residues than negatively charged residues ( Supplementary Fig. S9). Hydrogen was the most abundant, followed by carbon, nitrogen and oxygen, and sulfur ( Supplementary Fig. S10). Leucine was a very common amino acid among the 26 BrCNGC proteins ( Supplementary Fig. S11).

Distribution of Post-translational modifications and microRNA target sites in BrCNGCs.
Post-translational modifications (PTMs) of protein upturn the variety of their functions over and done with diverse mechanisms 46 . These mechanisms may include, protein localization, protein-protein interaction, cleavage, degradation or allosterically regulating enzyme activity 47 . We analysed BoCNGC protein sequences using ScanProsite 48 , multiple putative phosphorylation sites were identified (Table 4). These locations may act as substrates for numerous kinases, comprising tyrosine kinase, casein kinase II, cAMP/cGMP kinase, and protein kinase c. All proteins contained non-potential Glycosylphosphatidylinositol (GPI) anchor modification site in their sequences, while 16 BrCNGCs contained PEST-like sequences, which may act as a signal peptides for protein degradation 49 . Most abundant sites were casein kinase II sites, with 17 sites in BrCNGC7, followed by protein kinase C, were the maximum in clade IV members. All BrCNGC proteins had multiple N-glycosylation/ N-myristoylation motif locates are greatly preserved than rest of the PTMs. The rest of the PTM sites, such as those for amidations, leucine zipper patterns, and P-loop of the -GTP/ATP binding site motif A, were less preserved and arbitrarily dispersed, increasing diversity to function and mechanisms of CNGC-definite PTMs 47 . MicroRNAs (miRNAs) are interior non-coding RNAs that direct gene expression, particularly post-transcriptional gene silencing 50 . Recognising the targets of the expected miRNAs could facilitate the understandings of the genetic functions of miRNAs prompting signal transduction, stress adaptations, and plant development 51 . Herein, we investigated for possible miRNA targets in the set of recognized BrCNGC transcripts 52 . We recognized 92 miRNAs comprising target sites  Table S4). Small RNA/target site paired with an expectation score and cut-off threshold of 4 were included to reduce the number of false positive predictions. Consequently, seventeen miRNAs with target sites in fourteen BrCNGC genes were recognized, among which, four miRNAs with an expectation score < 3.5 can be considered more reliable (< 3.5) (Supplementary Table S5). Most of the BrCNGC genes included target site for single miRNA, except for BrCNGC14, BrCNGC20 and BrCNGC21, which contained target sites for 2 miRNAs. The convenience of the target site wide-ranging from 8.828 (bra-miR9552b-5p) to 20.9 (bra-miR160a-3p), where minor values resemble to a grander likelihood of interaction between the target site and miRNA 53 . Eleven miRNAs were found to be participated in cleavage of the target transcript, although six miRNAs supposedly inhibit the translation of target genes. These miRNAs were previously identified as novel or conserved miRNAs by Yu et al. 54 and Jiang et al. 55 in B. rapa L.and B. comparstis ssp. chinensis, respectively. Former research has shown that some of these miRNA families are greatly preserved in Brassicaceae or other plant species, located and expressed in leaves, pollen, roots or flower, with ancient functions in heat stress response (bra-miR5726, bra-miR5712 and bra-miR5716) 54,56 , regulation of target genes related to plant development (i.e., bra-miR156a/b/d-3p, bra-miR824, and bra-miR391-5p) 55 , somatic embryogenesis in Dimocarpus longan 57 , Brassica-specific hormone signal transduction pathway (i.e., bra-miR162-3p), drought stress tolerance in tomato (i.e., miR160a and miR9552b) 58 and response to Turnip mosaic virus (i.e., bra-miR1885a and bra-miR5717) 57 . The function of the remaining novel and conserved miR-NAs is not known yet, which requires further experimental elucidation.
In-silico functional relationship network of BrCNGC proteins. A theoretical protein-protein interaction was constructed with the STRING program to recognise the relations among unlike BoCNGC proteins 59 . The interaction network of first shell of interactors presented that thirteen BrCNGCs were part of various protein-protein interaction networks (Supplementary Fig. S12). Among these, seven proteins, namely BrCNGC2, 14-18 and interact with ubiquitin3 protein (Bra009542), detected by Affinity Capture-MS assay. It is reported that Polyubiquitin chain upon covalent binding to target protein governs proteolysis, DNA damage tolerance and other processes 60 . In another association, BrCNGC29 interact with Constitutive Photomorphogenic 1, experimentally

Functional analyses of BrCNGCs by transcriptome-based expression profiling. Expression patterns in different plant parts and wounding stress.
Scrutinising the steady-state expression patterns of BrCNGC genes in six tissues (i.e., root, stem, flower, silique, leaf, and callus) was performed via Illumina RNA-sequencing data from the Gene Expression Omnibus (GEO) database database. Out of the 29 BrCNGCs, fifteen were expressed at moderately high levels (fragments per kilobase of exon model per million mapped reads value > 1) in at least one tissue, including ten in silique, eleven in calli, twelve in the roots and stem, and fourteen in leaves and flowers. The remaining genes either displayed lowest transcript accumulation or did not express in any tissue ( Fig. 5; Supplementary Table S6). An additional investigation revealed that BrCNGC21 was the highest expressed genes, particularly in flowers and silique, suggesting they may be vital for Brassica species development.
Amongst the other genes, BrCNGC4 was greatly expressed in leaves, BrCNGC7 in stem and roots, although BrCNGC16 was greatly expressed in calli. Greater expression in silique and calli suggest the expression of these genes is induced by wounding. Our data suggest that BrCNGC genes in different tissues expressed differently, and that several genes are induced by wounding 22 . Highly expressed genes in certain tissues indicated some functional preservation, while others showing functional dissimilarities 62,63 .   64 . Methyl jasmonate (MeJA) is one such plant hormone that is used in diverse developmental pathways and defense in plants 65 . We determined the expression profiles of 29 BrCNGCs in the leaves of B. rapa, exposed to 0.2 mM of MeJA (Supplementary Table S7). The calculated fold-change data showed that fourteen genes were up-regulated at 8-10th leaf stage, seven genes were down-regulated, while the remaining genes didn't show low transcript abundance compared to control (Fig. 6a). Among these, BrCNGC13 showed maxim level of expression, which was up-regulated > 5.8-fold compared to unstressed control. On other hand, BrCNGC18 showed maximum negative response, which was-ninefold down-regulated compared to control. This pattern was followed by BrCNGC25 and BrCNGC29 respectively. These results indicated that the transcriptional responses of CNGCs along with other signal transduction pathway genes are regulated by MeJA 66,67 .
Expression patterns in response to bacterial pathogen and elicitor stress. Phytoalexins are antimicrobial substances produced by plants to elicit resistance against pathogen infection 68 . Most of the phyloalexin biosynthesis pathways are reported to be conserved across the B. rapa L. cultivars, Chiifu and Rapid Cycling (RCBr). Using illumina RNA-sequencing, Klein et al. 69 observed that some of phyloalexin biosynthesis pathways are activated  Supplementary Table S8. Most of the BrCNGC genes were expressed at higher levels after 9 h post-infiltration, including twenty-two genes in response to flg22, and twenty-one in response to Psm (Fig. 6b). Among these, > 18 BrCNGC genes were mutually expressed under both treatments, four expressed differentially, while seven genes didn't show any expression compared to uninfected controls. Compared with their mock treatments, the expression of ten genes was increased and eleven decreased in response to Psm. The maximum responses were noted for BrCNGC27 (> tenfold up-regulation) and BrCNGC20 (-sixfold down-regulation), respectively. On other hand, the expression of thirteen genes was increased and nine decreased in response to flg22, with notable responses shown by BrCNGC12 (> sixfold up-regulation) and BrCNGC19 (i.e.,-7.2-fold down-regulation) respectively. The results showed that three duplicated gene pairs (i.e., BrCNGC-22 /27, 25/28 and 26/29) has similar expression trend (Fig. 6b). These results indicate that various CNGCs may be involved plant defense against bacterial pathogens 34 .
Expression patterns in response to clubroot pathogen Plasmodiophora brassicae. Plasmodiophora brassicae is among the most common pathogens worldwide, which cause clubroot disease in Brassica crops 70 . In a latest study, Chen et al. 71 Table S9). The missing profiles of the remaining three genes (i.e., BrCNGC2,8,and 14), might be due to no expression at all, or these genes had spatial and temporal expression patterns 35 . As shown in fig.ure6c Expression patterns in response to trace elements stress. Trace elements are essential for human nutrients to fulfill their metabolic requirements 72 . Among these trace elements, iron (Fe) and zinc (Zn) are mainly significant, because their deficiency cause serious health and nutritional problems in human population 73 . On the other hand, Cadmium (Cd) is a toxic element found in the soil, which cause severe toxicity in plants, animals and humans 74 . It is documented fact that the excess of zinc intake also cause toxicity, which can be more harmful to the plants, compared to Zn deficiency 75 Table S10). Compared to control, seven genes were up-regulated under CdE, eight under FeD, twelve under ZnD, and eight genes were up-regulated under ZnE condition, respectively (Fig. 6d). On the contrary, nine genes were downregulated under CdE, eight under FeD, six under ZnD, and eight genes were down-regulated under ZnE, respectively. Some of the multi-copy genes, such as BrCNGC26 and BrCNGC29, showed similar trend under ZnD stress, while other gene pairs exhibited differential patterns. These observations are agreement to the findings of Li et al. 76 . The data showed that some of BrCNGC family genes are definitely involved in trace elements response, and further experiments will clarify their individual roles and help in improving environmental adaptability in B. rapa L.

Methods
Genome-wide identification of CNGC proteins. The identify CNGC gene family in B. rapa L., the protein sequences of twenty Arabidopsis CNGCs were collected from TAIR10 77 and BLAST searched against target proteomes in BRAD database 78 , using built-in BLASTP search. The matching protein sequences of target species were retrieved and analyzed in SMART 79 , Pfam 80 and Motif search service on GenomeNet for domain analysis. Finally, the target protein sequences comprising cNMP-binding (IPR000595) and ion transport (PF00520) domains were recognized as candidate CNGCs and manually checked for the presence plant CNGC-specific consensus motif within the cNMP-binding region 14 . The newly identified CNGC genes were named according to standard nomenclature (i.e., taxonomic initials such Br for B. rapa L.) and phylogenetic positions.
Multiple sequence alignment and phylogenetic analysis. ClustalX  Chromosomal mapping, gene duplication and syntenic analysis. The positional information from BRAD database was used for genomic mapping of CNGC genes on B. rapa L. chromosomes by using R script. The tandem and segmental duplications were analyzed by PGDD 85 and PTGBase 39 . The synteny relationship between BrCNGCs, AtCNGCs and BoCNGCs were assessed in Bolbase 86 , and mapped in a circos plot using R studio 87 .
Conserved motif composition and Gene structure. To predict the gene structures, we used Gene Structure Display Server (GSDS 2.0) 88 . To find conserved motifs in the CNGC protein sequences, we used the MEME and MAST motif discovery tools with default parameters 44 . The annotation of the motifs were performed in Pfam program 45 .
The miRNA target sites and protein-protein interaction. The microRNA sequences of B. rapa L.
were collected from miRBase database 89 and submitted to psRNATarget server 52 for miRNA's target sites prediction within the BrCNGC genes. Each of these miRNAs were searched online to find their experimental proof, function and related literature. The protein-protein interaction of BrCNGC proteins was constructed in STRING v10 59 , by using the CNGC protein sequences as reference.
Data sources and expression of BrCNGC genes. For expression profiling of BrCNGC genes in different plant tissues, the RNA-seq data placed in GEO database (GSE43245) was used 38,86 . For gene expression against different stress treatments, the expression data (GSE69785) of 15 days old RCBr plants infiltrated with Psm and flg22 69 , GSE74044 for expression in the roots of 30-day old NILs at 0, 12 , 72 and 96 h after inoculation of P. brassicae 71 , GSE51363 for expression in the leaves of B. rapa L. subsp. pekinensis, exposed to 0.2 mM MeJA at 8-10 leaf stage, and GSE55264 for expression in the leaves of 14 days plants exposed to Fe deficiency (0.05 µM; Normal = 3 µM), Zn deficiency (0.005 µM; Normal = 2 µM), Zn excess (50 µM; Normal = 2 µM) and Cd excess (1 µM; Normal = without Cd) for 7 days was used. Transcript abundance was calculated as FPKM and the values were log2 transformed. Data were plotted in heat maps generated in R studio 90 . For abiotic and biotic stress, we used a fold-change method, where the threshold of ≥ 0 defines a gene as "positively expressed/up-regulated" and threshold ≤ 0 as "negatively expressed/down-regulated", compared to FPKM values in control treatments.

Conclusion
This work is the first wide-ranging and systematic study of CNGC gene family in B. rapa L. This work identifies and fills the remaining gaps in literature, and present a clearer picture about plant CNGCs in general, and crucifers in particular. Here, we have tried to explore each and every aspect of BrCNGC gene family, from genes to protein, including gene structure, motif composition, miRNA target sites, post-translational modification sites, protein interaction network, GO-term prediction and orthologous relationship etc. The phylogenetic and synteny analyses will help in understanding the evolutionary patterns, and diversification and/or expansion of CNGC family genes in complex ancient polyploids (e.g., B. rapa/B. oleracea), whose genomes have undergone multiple duplication and reshuffling events. Additionally, this work will contribute to further clarify the functions of differentially expressed candidate BrCNGC genes through cloning, and to investigate their roles in the regulation of cascade pathways, plant development and stress tolerance in B. rapa L.

Data availability
The raw sequence datasets generated and analyzed during the current study are available through BRAD database (http:// brass icadb. org/ brad/). The gene expression data analyzed during the current study are available at GEO database with accession numbers: GSE43245, GSE69785, GSE74044, GSE51363 and GSE55264.